IARPA

BE  THE  FUTURE 


“Invests  in  high-risk/high-payoff  research  programs  that 
have  the  potential  to  provide  our  nation  with  an 
overwhelming  intelligence  advantage” 

http://www.iarpa.gov/ 
Goal: Validated, early detection of technical emergence


Enable  reliable,  early  detection  of  emerging  scientific  and 
technical  capabilities  across  disciplines  and  languages  found 
within  the  full-text  content  of  scientific,  technical,  and  patent 
literature 


Focus from the outset on English, Chinese, German, Japanese, Russian, Korean, and Spanish

Novelty  ->  Discover  patterns  of  emergence  and  connections  between 
technical  concepts  at  a  speed,  scale,  and 
comprehensiveness  that  exceeds  human  capacity 

Usage  ->  Alert  analyst  of  emerging  technical  areas  with  sufficient 
explanatory  evidence  to  support  further  exploration 
Worldwide Scientific and Patent Literature


[Charts: Publications (by Language) and Patents; Patents by Language (legend: Korean in cyan, German in green, Russian in orange, Other Languages)]

Growth in scientific and patent literature is estimated at 800k docs/month
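For a sense of scale, the ~800k documents/month estimate can be turned into yearly and daily rates. The sketch below is plain arithmetic on that figure, assuming a constant monthly rate and a 365-day year; the derived numbers are not from the source.

```python
# Rough volume estimate from the slide's figure of ~800k new scientific and
# patent documents per month. Assumes a constant monthly rate and a 365-day
# year; the derived numbers are illustrative, not taken from the source.
DOCS_PER_MONTH = 800_000
MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = 365


def yearly_volume(docs_per_month: int = DOCS_PER_MONTH) -> int:
    """Documents added per year at a constant monthly rate."""
    return docs_per_month * MONTHS_PER_YEAR


def daily_volume(docs_per_month: int = DOCS_PER_MONTH) -> float:
    """Average documents added per day."""
    return yearly_volume(docs_per_month) / DAYS_PER_YEAR


if __name__ == "__main__":
    print(f"{yearly_volume():,} docs/year")   # 9,600,000 docs/year
    print(f"{daily_volume():,.0f} docs/day")  # 26,301 docs/day
```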
FUSE Approach


Today,  ad  hoc  “technical  horizon  scanning” 
already  consumes  substantial  expert  time,  is 
narrowly  focused  on  a  small  number  of  topics,  and 
is  subject  to  limited  systematic  validation. 


Analysts  need  to  scan 
continually  for  signs  of 
technical  capability 
emergence. 


Today -> FUSE

• Manual -> Automatic
• Selected coverage -> “Complete” literature coverage
• Updated infrequently -> Updated monthly
• Months to produce (for one technical area) -> 24 hrs to produce (for all technical areas)
• Ad hoc evaluation -> Formal models of emergence

Complete,  Continual,  Unbiased 
FUSE Research Thrusts


All  teams  are  pursuing  all 
research  areas  in  parallel 


Theory  Development 


Indicator  Development 


Nomination Quality

Challenge Question Nomination


Evidence  Representation 


RNAi : 2006-2010 : CQ1


Was  there  a  community  of  practice  around  RNAi  during  2006-2010? 
The  answer  is  YES,  with  a  confidence  of  72% 

Many indicators suggest a positive answer to the CQ, especially within the Coauthorship Graph, Time Series and Funding groups.

Coauthorship  Graph 


The coauthorship graph for RNAi spans 520 authors, and it has the properties of a small-world network, which is typical of real-world
communities. It is a fully connected network with a high clustering coefficient as well as a reasonable … coefficient.

Coauthorship graph indicators are the most powerful when determining the answer for CQ1, and in this case their values strongly point in
the direction of a positive answer.


RNAi :  Community  of  Practice 
2001 - 2005
Practical Application
FUSE Theory Exploration
What is technical emergence?


Current  Hypotheses 


•  A  concept  has  emerged  if  it  has  been 
accepted  by  others  within  and  beyond 
one’s  community.  -Columbia 

•  A  concept  is  emerging  when  its  actant 
network  is  increasing  in  robustness.  -BAE 

• A concept has emerged when evidence
has appeared that the concept is new and
unexpected, noticeable and growing. -Raytheon BBN

•  A  concept  is  emerging  when  it  is 
identifiable  by  its  own  practitioners, 
enables  a  capability  that  was  not 
achievable  previously,  and  persists.  -SRI 


Many  ways  to  probe 
technical  emergence 

•  Community  of  Practice 

•  Practical  Application 

•  Debates 

•  Alternative 

•  Acceptance 

•  Interdisciplinarity 

• Attention (Citation) Prediction

•  Dominant  sub-topic 
within  set 

•  Commercial  Application 

•  Infrastructure 
Text to Indicator Implementation


[Diagram labels: Challenge Questions; Indicators; Pattern Analysis; Fused Features; Features; Sci & Tech Literature]


New  classes  of  FUSE-related 
features  are  being  developed  and 
validated 

•  Community  response,  role  in 
community 

•  Topic  stability 

•  Rhetorical  stance  for  refs/citations 

•  Zoning  the  full-text  content 

•  Methods  moving  from  “the  topic”  to 
“used”  to  “mentioned” 

•  Quality  and  quantity  of  resources 
available  to  an  investigator  or 
inventor 

•  New  terms  introduction  and  adoption 
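As one hypothetical illustration of the “new terms introduction and adoption” feature, the sketch below records the year each term first appears in a corpus and flags recently introduced terms whose per-year document count has grown. It assumes documents are already reduced to (year, term list) pairs; the input format, the `since` cutoff, and the 2x growth threshold are illustrative choices, not the FUSE teams' actual method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Doc = Tuple[int, List[str]]  # (publication year, extracted terms): hypothetical input format


def term_adoption(docs: Iterable[Doc]) -> Dict[str, Dict[int, int]]:
    """Map each term to {year: number of documents that use it that year}."""
    counts: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for year, terms in docs:
        for term in set(terms):  # count a term at most once per document
            counts[term][year] += 1
    return {term: dict(by_year) for term, by_year in counts.items()}


def growing_new_terms(docs: Iterable[Doc], since: int, min_growth: float = 2.0) -> List[str]:
    """Terms first seen in or after `since` whose document count in the last
    observed year is at least `min_growth` times their first-year count."""
    docs = list(docs)
    adoption = term_adoption(docs)
    last_year = max(year for year, _ in docs)
    growing = []
    for term, by_year in adoption.items():
        start = min(by_year)  # year the term was introduced
        if start < since:
            continue
        if by_year.get(last_year, 0) >= min_growth * by_year[start]:
            growing.append(term)
    return sorted(growing)


corpus = [
    (2006, ["rna", "gene"]),
    (2007, ["rna", "sirna"]),
    (2008, ["sirna", "gene"]),
    (2008, ["sirna", "rna"]),
    (2008, ["sirna"]),
]
print(growing_new_terms(corpus, since=2007))  # ['sirna']
```

Counting each term once per document measures adoption across documents rather than repetition within a single paper.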
Nomination and Evidence Explanation


RNAi : 2006-2010 : CQ1


Was  there  a  community  of  practice  around  RNAi  during  2006-2010? 
The  answer  is  YES,  with  a  confidence  of  72% 

Topic  Summary 

RNA  interference  (RNAi)  is  an  RNA-dependent  gene  silencing  process  within  living 
cells.  The  selective  and  robust  effect  of  RNAi  on  gene  expression  makes  it  a  valuable 
research  tool,  both  in  cell  culture  and  in  living  organisms. 

Justification  and  evidence  for  answer 


Ongoing  Work 


Many indicators suggest a positive answer to the CQ, especially within the
Coauthorship Graph, Time Series and Funding groups.


Coauthorship  Graph 



The coauthorship graph for RNAi spans 520 authors, and it has the properties of a
small-world network, which is typical of real-world communities. It is a fully connected
network with a high clustering coefficient as well as a reasonable …
coefficient.


Coauthorship graph indicators are the most powerful when determining the answer for
CQ1, and in this case their values strongly point in the direction of a positive answer.
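Small-world networks are commonly characterized (following Watts and Strogatz) by a high average clustering coefficient combined with a short average shortest-path length. The sketch below computes those two quantities, plus a connectivity check, for a toy coauthorship graph built from author lists. It reads “fully connected” on the slide as “a single connected component”, which is an assumption, and it does not attempt to reconstruct the second, truncated coefficient; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def coauthorship_graph(papers: List[List[str]]) -> Graph:
    """Undirected graph: authors are nodes; co-authoring any paper adds an edge."""
    graph: Graph = {}
    for authors in papers:
        for name in authors:
            graph.setdefault(name, set())
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_clustering(graph: Graph, node: str) -> float:
    """Fraction of pairs of the node's coauthors who have also coauthored together."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * linked / (k * (k - 1))


def average_clustering(graph: Graph) -> float:
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Hop distance from source to every reachable author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_connected(graph: Graph) -> bool:
    return len(bfs_distances(graph, next(iter(graph)))) == len(graph)


def average_path_length(graph: Graph) -> float:
    """Mean shortest-path length over all ordered author pairs (connected graph only)."""
    n = len(graph)
    total = sum(sum(bfs_distances(graph, s).values()) for s in graph)
    return total / (n * (n - 1))


papers = [["A", "B", "C"], ["C", "D"], ["D", "E", "F"]]
g = coauthorship_graph(papers)
print(round(average_clustering(g), 3), is_connected(g), average_path_length(g))  # 0.778 True 1.8
```

For a graph the size of the RNAi example (520 authors), running breadth-first search from every node is still cheap; much larger graphs would call for sampling or a graph library.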


Time  Series 



Time  series 

A  time  series  is  a  sequence  of  data  points  measured  at  successive  time  instants  spaced  at  uniform  time 
intervals  (in  our  case,  years).  Time  series  analysis  looks  at  the  way  that  various  functions  of  the  RDG  behave 
over  time. 


The time series for number of
papers and number of unique
authors have very high slopes,
indicating a lively community of
practice which evolves over time.
The numbers of in-citations and
out-citations also progress visibly
over time.


Time  series  indicators  are  very 
important  when  determining  the 
answer for CQ1. In this case, they
clearly  suggest  a  positive  answer. 
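The slide does not say how slope is measured. Assuming an ordinary least-squares linear trend over yearly counts, a minimal sketch of the papers-per-year and unique-authors-per-year series could look like this (the input format and function names are hypothetical):

```python
from typing import Dict, List, Sequence, Set, Tuple

Paper = Tuple[int, List[str]]  # (publication year, author names): hypothetical input


def yearly_series(papers: Sequence[Paper]) -> Tuple[List[int], List[int], List[int]]:
    """Years, papers per year, and unique authors per year.
    Years with no papers inside the observed range are filled with 0."""
    first = min(year for year, _ in papers)
    last = max(year for year, _ in papers)
    years = list(range(first, last + 1))
    paper_counts: Dict[int, int] = {y: 0 for y in years}
    authors: Dict[int, Set[str]] = {y: set() for y in years}
    for year, names in papers:
        paper_counts[year] += 1
        authors[year].update(names)
    return years, [paper_counts[y] for y in years], [len(authors[y]) for y in years]


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs (needs at least two distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


papers = [
    (2006, ["A"]),
    (2007, ["A", "B"]), (2007, ["C"]),
    (2008, ["A", "B"]), (2008, ["C", "D"]), (2008, ["E"]),
]
years, n_papers, n_authors = yearly_series(papers)
print(n_papers, ols_slope(years, n_papers))    # [1, 2, 3] 1.0
print(n_authors, ols_slope(years, n_authors))  # [1, 3, 5] 2.0
```

With only one distinct year the variance is zero and the slope is undefined, so a real indicator would need a minimum time window.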


Time  series:  RNAi 
FUSE Validation / Metrics


•  Guiding  the  development  of  theories  and  indicators  of  emergence 

-  “Emergence  Theory  Workshop  and  Peer  Review” 

•  Effective  identification,  prioritization  and  nomination  of  technical 
areas  as  compared  to  real  world  (e.g.,  experts,  case  studies, 
present  day  tests  for  both  positive  /  negative  examples) 

-  “Nomination  Quality” 

•  Evidence  provided  in  a  clear  and  humanly  usable  form 

-  “Evidence  Quality” 

•  System  to  perform  at  scale  across  multiple  languages 

-  “Computational  Efficiency”  and  “Multilingual  Performance” 

•  Measure  technical  emergence  from  “real  world”  viewpoint  that  is 
drawn  from  diverse  areas  of  scientific  inquiry  &  application 

-  “Case  studies  and  reference  baseline” 
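“Nomination Quality” compares system nominations with real-world positive and negative examples, and the backup material lists nomination precision and recall among the properties being validated. Assuming a simple binary reference label per RDG, those scores could be computed as follows; the RDG identifiers and labels are hypothetical, and FUSE's formal metric definitions are not given here.

```python
from typing import Dict, Set, Tuple


def nomination_scores(nominated: Set[str], reference: Dict[str, bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of nominated RDGs against reference labels
    (True = emergent, False = not emergent). Nominated RDGs that have no
    reference label are left out of the precision denominator."""
    positives = {rdg for rdg, emergent in reference.items() if emergent}
    judged = nominated & set(reference)
    true_pos = len(judged & positives)
    precision = true_pos / len(judged) if judged else 0.0
    recall = true_pos / len(positives) if positives else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical reference labels (e.g. from case studies) and system nominations.
reference = {"RDG-1": True, "RDG-2": False, "RDG-3": True, "RDG-4": True}
print(nomination_scores({"RDG-1", "RDG-2", "RDG-3"}, reference))
```

Skipping unlabeled nominations is a design choice: it avoids penalizing a system for nominating areas the reference baseline simply does not cover.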
FUSEnet - Computational Environment


FUSEnet 

-  Government  system  hosted  by  Oak  Ridge 
National  Laboratory  (ORNL) 

-  Protected  unclassified  system  with  remote 
access  for  all  approved  users 

Current  Specifications  for  FUSEnet  1.0 

-  770  gigaFLOPS*  of  maximum  performance 

-  16  blade  servers,  each  with  6  cores,  totaling  192 
processors 

-  2  blade  servers  for  backup  (not  in  FLOPS  estimate) 

- 96 GBytes of RAM per server for a total of 1.5 TBytes

-  260  TBytes  of  effective  disk  storage 

-  iSCSI  10  Gigabit  connectivity 

-  Virtualized  computing  space  through  VMware 

-  Access  to  Document  Repository  (DR)  through  iSCSI 

-  Access  and  control  policies  are  enforced  by  ORNL 

-  Call  Center  and  metrics  for  service  quality 

FUSEnet  2.0  Specifications  Pending 
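The FUSEnet 1.0 figures can be cross-checked with simple arithmetic. 16 servers at 96 GBytes each gives 1,536 GBytes, i.e. 1.5 TBytes in binary units, matching the stated total. However, 16 servers with 6 cores each gives 96 cores, not 192; the 192 total corresponds to 12 per server (for example, two 6-core CPUs per blade, though the slide does not say). The sketch below makes these checks explicit:

```python
# Figures as listed for FUSEnet 1.0 on the slide (backup blades excluded).
SERVERS = 16
CORES_PER_SERVER_AS_LISTED = 6
TOTAL_PROCESSORS_AS_LISTED = 192
RAM_PER_SERVER_GB = 96
PEAK_GFLOPS = 770

total_ram_gb = SERVERS * RAM_PER_SERVER_GB                    # 1,536 GB
total_ram_tb = total_ram_gb / 1024                            # 1.5 TB (binary units)
cores_from_listing = SERVERS * CORES_PER_SERVER_AS_LISTED     # 96, not 192
processors_per_server = TOTAL_PROCESSORS_AS_LISTED / SERVERS  # 12.0
gflops_per_processor = PEAK_GFLOPS / TOTAL_PROCESSORS_AS_LISTED

print(f"RAM: {total_ram_gb} GB = {total_ram_tb} TB")
print(f"Cores from per-server listing: {cores_from_listing}")
print(f"Processors per server implied by total: {processors_per_server}")
print(f"Peak per processor: {gflops_per_processor:.2f} GFLOPS")
```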
If you had a system that could


(a)  reliably  identify  what  technical 
capabilities  are  emerging 

and 

(b) provide humanly understandable evidence explanation

then 


how  would  you  use  this  tool? 
How might an IARPA PM use FUSE?


•  Idea  development  for  a  relevant  problem  or  problem  space 

-  Answer  Heilmeier  Questions  (http://www.iarpa.gov/ioin3.htm  ) 

-  Hype  versus  reality 

-  New  enabling  component  capabilities,  signs  of  potential  convergence 

-  Why  is  this  innovative?  Does  it  overlap  with  past  and  existing  efforts? 

-  Does  new  capability  or  convergence  of  capabilities  require  further 
investment  to  motivate  progress?  Optimal  emergence  state  for  an  R&D 
program  investment 

•  Program  impact  assessment 

-  Observable  impact  within  a  research  community,  across  research 
communities,  beyond  the  research  community? 

•  Are  these  focused  programs  yielding  novel  results  or  not? 

•  Is  application  development  potential  increasing  as  a  result  of  this  effort? 

•  Is  there  a  gap  between  funding  and  output? 

-  Which  areas  are  being  driven  by  a  focused  research  program?  Which 
ones  are  not  (but  still  have  a  vibrant  community  of  practice)?  Which 
ones  are  being  driven  by  many  research  programs? 
Would the components of FUSE
enable  a  new  capability 

or 

enhance  an  existing  workflow? 
Potential Component Services


• Lineage / Provenance of Acquired Data

• Data Transformation (e.g., PDF to XML)

• XML Enrichment (e.g., new extractions, zoning)

• Nomination, Prioritization by Time Period

• Evidence Explanation and Justification for Nomination
FUSE Status Update


• Five-year, fundamental research program

• Teams under contract
since August 2011

-  BAE  Systems 

-  Columbia  University 

-  Raytheon  BBN 

-  SRI  International 


• Formal test and evaluation to begin October 2012,
three  additional  rounds  of  formal  evaluation  scheduled 

-  Case  Studies,  Eight  Examples:  Tissue  Engineering,  Cold  Fusion,  RF 
Metamaterials,  DNA  Microarrays,  Genetic  Algorithms,  RNAi, 
Steganography,  Horizontal  Gene  Transfer 


[Schedule chart, FY11 to FY16 by quarter:
Phase 1 (18 months), Base Period; Evaluation
Phase 2 (15 months), Option Period 1; Evaluation
Phase 2 (15 months), Option Period 2; Evaluation
Phase 3 (12 months), Option Period 3; Evaluation
Live Demonstration; Complete Program]
The FUSE Team


Columbia Engineering, The Fu Foundation School of Engineering and Applied Science
University of Illinois at Urbana-Champaign
Georgia Tech
BAE Systems
Raytheon BBN Technologies
UMass Amherst
SciTech Strategies
Penn, University of Pennsylvania
Oak Ridge National Laboratory
MITRE
IDA
NAVSEA Warfare Centers, Dahlgren
Northrop Grumman
Booz Allen Hamilton
Avian Engineering LLC
Tarragon Consulting Corporation


3 (+2) large businesses
3 (+3) small businesses
14 academic orgs
1 not-for-profit org
Many data vendors
Plus FFRDCs & gov orgs
A Few Unique Qualities of FUSE

•  Connecting  disparate  communities  with  lots  of  data  sows  the  seeds 
for  much  discovery 

•  Social  science,  emergence  theory,  Natural  Language  Processing  (NLP),  etc. 

•  Scientific,  Technical,  and  Patent  literatures  are  new  genres 

• Most Natural Language Processing (NLP) tools are trained on newswire

•  Foreign  language  tokenization,  etc.,  too 

•  Novel  extrinsic  test  for  technical  emergence 

•  Measuring  detection  of  emergence  (S&T) 

•  Not  measuring  NLP  metric  X  (internal  measures) 

•  Challenging  scale,  but  tractable 

•  Concept  extraction,  within/cross-doc  linkages 

• Currently growing on the order of 800k/mo. (sci lit + patents); FUSE has ~75-80% of
filed patents & ~10% of sci lit (1980-2010)

•  Automated  evidence  explanation  is  really  challenging,  but  exciting 


•  Synthesize  &  Scan  “Horizon,”  not  Search 
Anticipated Impact


*  Scientific  &  Technical  Analysis  Impact 

-  Relevant,  timely,  and  bias-controlled  analytic  force  multiplier  to  maintain 
technical  vigilance,  across  all  disciplines  and  multiple  languages 

-  Discover  previously  unknown  emergence  signals  of  interest  at  speed, 
scale,  and  comprehensiveness  that  exceeds  human  capacity 

*  Technical  Impact 

-  Generalized  and  validated  theories  of  technical  emergence 

-  New  cross-document  conceptual  feature  extraction  technologies 

-  Progress  in  computer-generated  evidence  representations  for  human  use 

*  Secondary  Impact 

-  Improved  priority  filter  for  USG  investment  strategies  and  policy 
Questions


Dewey  Murdick,  Ph.D. 

FUSE  Program  Manager,  IARPA 
dewey.murdick@iarpa.gov 
Finding patterns of emergence
in  science  and  technology 


Today,  the  identification  and  assessment  of  emerging  technical  capabilities  is  a  time-consuming,  domain- 
specific,  and  expert-intensive  process.  This  demanding  process  is  often  carried  out  under  severe  time 
constraints  on  either  too  much  or  too  little  data,  with  limited  reproducible  auditing  and  bias  controls,  and  with 
limited  systematic  validation  against  real  world  activities.  Furthermore,  the  increasing  globalization  of  science 
and  technology  raises  the  potential  for  high-impact  technical  capabilities  to  emerge  in  increasingly  diverse 
technical,  socio-economic,  and  geographic  areas. 


Analysts, subject-matter experts, and even research program managers benefit from a reliable, evidence-based
capability that allows them to dramatically accelerate the horizon-scanning process and reduce the labor involved
in identifying specific emerging technical areas in context for in-depth review. The Foresight and Understanding
from Scientific Exposition (FUSE) Program is the Intelligence Advanced Research Projects Activity (IARPA)
response to this need.


The  FUSE  Program  seeks  fundamental  advances  in  our  understanding  of  how  the  real-world  processes  of 
technical  emergence  leave  discernible  traces  in  the  public  scientific,  technical,  and  patent  literature,  and  how 
those  traces  can  be  detected,  fused,  and  prioritized.  FUSE  aims  to  develop  and  validate  a  comprehensive 
suite  of  quantitative  measures  of  technical  emergence  that  generalize  across  language,  culture  and  technical 
area.  Technology  developed  from  the  FUSE  Program  will  automatically  nominate  both  known  and  novel 
technical  areas  based  on  quantified  indications  of  technical  emergence  with  sufficient  supporting  evidence 
and  arguments  for  that  nomination. 


The  presentation  will  introduce  the  technical  approach  and  explore  the  potential  impact  of  technologies  and 
insights  that  may  emerge  as  a  result  of  the  FUSE  Program.  For  more  information,  see 

http://www.iarpa.gov/solicitations  fuse.html. 
BACKUP
Foresight and Understanding from Scientific
Exposition  (FUSE)  Program 


GOAL:  Enable  analysis  by  reliably  detecting 
emerging  scientific  and  technical  capabilities 
across  disciplines  and  languages  found  within 
the  full-text  content  of  scientific,  technical,  and 
patent  literature  (in  EN,  CN,  DE,  JP,  RU,  ...) 
WHY:  Analysts  need  effective  ways  to 
maintain  technical  vigilance  and  discover 
previously  unknown  capabilities  in  a  relevant, 
timely,  and  bias-controlled  analytic  manner 
HOW:  Develop  theories  and  indicators  of 
technical  emergence;  process  full-text 
literature  for  relevant  features;  identify, 
prioritize  &  nominate  high-priority  technical 
areas  and  provide  evidence  with 
understandable  explanations  for  analytic  use 
SUCCESS:  Systems  that  “speed  read”  100s  of 
millions  of  pages  of  technical  text  and  provide 
understandable  and  useful  evidence  that 
justifies  high-priority  alerts  of  emerging 
capabilities


Case  Studies,  Eight  Examples: 

•Tissue  Engineering,  Cold  Fusion,  RF 
Metamaterials,  DNA  Microarrays,  Genetic 
Algorithms,  RNAi,  Steganography,  Horizontal 
Gene  Transfer 

Accomplishments: 

•  Currently  in  Phase  1  through  7  Feb  2013 

•  Four  research  teams  developing,  testing,  and 
validating  theories  of  emergence,  indicator 
efficacy,  nomination  precision  and  recall, 
evidence  clarity,  and  scalable  system 
performance 

Upcoming  milestones: 

•  July  2012  intermediate  system  test 

• Oct 2012 - Jan 2013 system test and
evaluation  cycle 
Specific Phase 1 Goals


•  Craft  viable  theories  /  hypotheses  of  technical  emergence 

•  Effectively  use  full-text  features  and  measure  impact  on  technical 
emergence 

•  Correctly  nominate  document  groups  that  exhibit  technical 
emergence  as  represented  by  challenge  questions,  time  periods 

-  Valid  nomination  extends  across  disciplines 

-  Satisfactory  prioritization  of  topics  over  time  periods 

•  Establish  a  reliable  measure  for  Evidence  Quality  (i.e.,  the  rubric) 
and  deliver  comprehensible  evidential  support  for  nomination 

•  Demonstrate  proof-of-concept  nomination  for  Chinese  and 
German  topics  with  realistic  progress  on  multilingual  components 

•  Demonstrate  system  functionality  that  establishes  confidence  that 
team  can  transition  to  Phase  2  (e.g.,  scalability,  minimize 
brittleness  across  disciplines  and  document  types) 
Why now?


•  Important  problems  to  overcome: 

-  Need  to  learn  to  automatically  scan  for  emergence  (beyond  search) 

-  Too  much  information  to  analyze,  in  too  many  languages 

•  Support  strategic  investment 

•  Facilitate  discovery  and  innovation 

-  Cannot  reliably  query  for  patterns  that  indicate  emergence  without  starting  with  a 
known,  named  subject 

•  Automated  analysis  is  likely  to  work  because: 

-  The  scientific  literature  is  now  available  in  digital  formats 

•  Metadata  records  are  well  curated  and  ready  for  use 

•  Exploitation  of  the  full  text  of  documents  is  now  possible  (although  not  easy) 

-  Emerging  text  and  “signal”  analysis  (temporal  pattern)  techniques  are  promising 

•  Context-sensitive  feature  extraction  from  text 

•  Unsupervised  clustering 

•  Machine  learning 

•  Statistical  modeling 

•  Pattern  matching  and  analysis 

•  Indicator  development  and  validation 
Program Structure


Phase (Period): Duration
Primary English and Multilingual Goals

Phase 1 (Base Period): Aug 2011 - Feb 2013 (18 mo)

• Demonstrate that full-text literature can be the source for robust indicators of technical emergence within a consistent theoretical construct. Automatically prioritize a small number of provided Related Document Groups (RDGs), each representing a single technical area. Nominate those RDGs that exhibit technical emergence.

• Demonstrate proof-of-concept functionality in at least two languages in addition to English.

Phase 2 (Option Periods 1 & 2): Feb 2013 - May 2014 (15 mo); May 2014 - Aug 2015 (15 mo)

• Demonstrate automatic generation and nomination of those RDGs that exhibit single technical area emergence, from a collection of millions of full-text documents.

• For at least two languages in addition to English, automatically prioritize provided RDGs, each representing a single technical area. Nominate those RDGs that exhibit technical emergence.

Phase 3 (Option Period 3): Aug 2015 - Aug 2016 (12 mo)

• Demonstrate automatic generation and nomination of those RDGs that exhibit technical emergence across disparate technical areas, from a collection of millions of full-text documents.

• For at least two languages in addition to English, demonstrate automatic generation and nomination of those RDGs that exhibit single technical area emergence, from a collection of full-text documents.
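The phase boundaries in this table can be checked against the “Five year” program length given in the status update. This sketch assumes each phase starts and ends on the first of the listed month:

```python
from datetime import date

# Phase boundaries from the Program Structure table; day of month assumed to be 1.
PHASES = [
    ("Phase 1 (Base Period)", date(2011, 8, 1), date(2013, 2, 1)),
    ("Phase 2 (Option Period 1)", date(2013, 2, 1), date(2014, 5, 1)),
    ("Phase 2 (Option Period 2)", date(2014, 5, 1), date(2015, 8, 1)),
    ("Phase 3 (Option Period 3)", date(2015, 8, 1), date(2016, 8, 1)),
]


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def contiguous(phases) -> bool:
    """True if every phase starts exactly when the previous one ends."""
    return all(prev[2] == nxt[1] for prev, nxt in zip(phases, phases[1:]))


durations = {name: months_between(start, end) for name, start, end in PHASES}
total_months = sum(durations.values())

for name, months in durations.items():
    print(f"{name}: {months} mo")
print(f"Total: {total_months} mo ({total_months / 12:g} years), contiguous={contiguous(PHASES)}")
```

The computed durations (18, 15, 15 and 12 months) match the table and add up to 60 months.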

INTELLIGENCE  ADVANCED  RESEARCH  PROJECTS  ACTIVITY  (IARPA) 




Foresight  and  Understanding  from  Scientific  Exposition  (FUSE) 


OFFICE  OF  THE  DIRECTOR  OF  NATIONAL  INTELLIGENCE 


Leading  Intelligence  Integration 


Data  Files  Loaded  on  the  FUSEnet  “DR”  Per  Source 


Data  Source  (update)  Size  (GB) 


Elsevier  (yr) 


IEEE-ASPP  (opt) 


IEEE-POP  (opt) 


IEEE-TJ  (opt) 


Lexis-Nexis  -  CN 


LN-DE 


LN-EP 


LN-GB 


LN- JP 


LN-KR 


LN-RU 


LN-SU 


LN-US 


LN-WO 


LN  All  Update  (mo) 


Nature 


PUBMED  Central+  (opt) 


Scopus  (yr) 


SPIE 


BEEM 


#  Metadata  XML  #  Full-text  XML 
Files  Files 


264 


393 
1,136 


Total 


574 


12,528 


#  Full-Text 
ETL XML 
Files 


#  Full-text  PDF 
Files 


#  Image  Files 


3,825,122 


383,212 
1,908,581 


491,232


483,102 


48,693,056 


88,627,434 


100,886,913 


738,860 


2,756,102 


26,095,450 
How do we probe “Technical Emergence”?

Government-Defined  Challenge  Questions 

•  Was  there  a  community  of  practice  around  <concept>  during  <time  period>? 

•  Were  there  debates  within  the  scientific  community  on  <concept>  during  <time  period>? 

•  Was  there  a  demonstration  of  practical  application  of  <concept>  during  <time  period>? 

•  Was  <concept>  considered  an  alternative  to  an  established  concept  during  <time 
period>? 

• Was there a demonstration of commercial application of <concept> during <time period>?

•  Was  the  infrastructure  required  to  perform  research  in  <concept>  readily  available  during 
<time  period>? 

Performer-Defined  Challenge  Questions 

•  Was  <concept>  accepted  during  <time  period>?  Columbia 

•  Did  the  acceptance  of  <concept>  increase  or  decrease  during  <time  period>?  Columbia 

•  How  interdisciplinary  was  the  scientific  and  technical  knowledgebase  around  <concept> 
during  <time  period>?  SRI 

•  Did  usage  of  new  terminology  describing  <concept>  increase  in  robustness  during  <time 
period>?  BAE 

•  How  many  citations  of  papers  from  <concept>  published  in  <time  period  1>  would  you 
expect to see in <time period 2>? Raytheon BBN

• Does the <concept> dominate a thread during <time period>? Raytheon BBN
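SRI's interdisciplinarity question needs some measure of how spread a concept's knowledge base is across disciplines. One common, simple choice (an assumption here, not necessarily SRI's indicator) is the Shannon entropy of the discipline labels attached to a concept's documents or cited references, optionally normalized by its maximum value to give an evenness score between 0 and 1. The discipline labels in the usage example are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_diversity(categories: Iterable[str]) -> float:
    """Shannon entropy (in nats) of the discipline mix; 0 for a single discipline."""
    counts = Counter(categories)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def evenness(categories: Iterable[str]) -> float:
    """Entropy divided by its maximum, log(number of distinct disciplines)."""
    cats = list(categories)
    k = len(set(cats))
    return shannon_diversity(cats) / math.log(k) if k > 1 else 0.0


# Hypothetical discipline labels of references cited by papers on one concept.
refs = ["biology", "chemistry", "biology", "physics"]
print(round(shannon_diversity(refs), 3), round(evenness(refs), 3))
```

Entropy rewards both the number of disciplines and how evenly the references are spread across them; measures that also account for how related the disciplines are would need an extra similarity matrix.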
Case Studies


Drawn  from  diverse  areas  of  scientific  inquiry  & 
application: 

-  Biological  Sciences  /  Biotechnology 

-  Computer  Science  /  Information  Science;  Engineering 

-  Mathematics  /  Statistics 

-  Physical  Sciences;  Earth  Science 

-  Medical  /  Clinical  /  Infectious  Disease  /  Health  Services; 

-  Social  Sciences;  ... 

Technical emergence measured from “real world” viewpoint, but connected to literature

Multiple  case  studies  to  be  produced;  some  are  held 
back  for  evaluation 

-  Case  studies  are  representative  but  not  comprehensive 

-  Insufficient  to  train  technical  emergence  classifiers 

-  Limited  examples  of  emergence  &  non-emergence  (10s  planned) 

-  Reference  baseline  has  limited  temporal  resolution  (~5  year  blocks) 
FUSE Horizon Scanning


[Diagram labels: Evidence; Nomination; Challenge Questions; Indicators; Pattern Analysis; Fused Features; Meaning of Words & Phrases; Doc Structure & Links; Features; Sci, Tech, Informal Lit.; DR + Structure; Human-crafted RDGs]

Emergence Queue(s) for Analysis: Community of Practice; Practical Application; Performer CQ; each queue lists RDGs in rank order (1. RDG #, 2. RDG #, ...)

Generic Use Case (e.g., Alerts and Interactive Queue Exploration and Analysis)

-> Eval Interface (e.g., Web-Browser)

• Descriptive Text, Indicator template, ...
• Time Series, Network Graphs, ...
• Text Snippet, Generated Text, Doc Refs, ...
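The diagram does not specify how indicator values are fused into a nominated answer with a confidence (such as the “YES, with a confidence of 72%” answer in the RNAi example). A minimal sketch, assuming a weighted logistic combination of indicator scores and one ranked queue per challenge question, is shown below; the indicator names, weights, and bias are hypothetical and would in practice have to be learned or calibrated.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Nomination:
    rdg: str
    confidence: float  # fused estimate that the answer to the challenge question is YES

    @property
    def answer(self) -> str:
        return "YES" if self.confidence >= 0.5 else "NO"


def fuse(indicators: Dict[str, float], weights: Dict[str, float], bias: float = 0.0) -> float:
    """Weighted logistic combination of indicator scores, giving a value in (0, 1).
    Indicators without a weight are ignored."""
    z = bias + sum(weights[name] * value for name, value in indicators.items() if name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def emergence_queue(
    rdg_indicators: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    bias: float = 0.0,
) -> List[Nomination]:
    """Queue for one challenge question: RDGs sorted by fused confidence, highest first."""
    nominations = [Nomination(rdg, fuse(ind, weights, bias)) for rdg, ind in rdg_indicators.items()]
    return sorted(nominations, key=lambda n: n.confidence, reverse=True)


# Hypothetical indicator scores and weights for a community-of-practice question.
weights = {"coauthor_clustering": 2.0, "paper_growth": 1.5}
rdg_indicators = {
    "RDG-1": {"coauthor_clustering": 0.8, "paper_growth": 0.9},
    "RDG-2": {"coauthor_clustering": 0.1, "paper_growth": -0.4},
}
for rank, nom in enumerate(emergence_queue(rdg_indicators, weights, bias=-1.0), start=1):
    print(f"{rank}. {nom.rdg}: {nom.answer}, confidence {nom.confidence:.0%}")
```

A logistic combination keeps each confidence between 0 and 1 and makes each indicator's contribution readable from its weight, which helps when the evidence behind a nomination has to be explained.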
IARPA Overview


IARPA’s mission is to invest in high-risk/high-payoff research programs that
have the potential to provide the U.S. with an overwhelming intelligence
advantage over our future adversaries

•  CAVEAT:  HIGH-RISK/HIGH-PAYOFF  IS  NOT 
A  FREE  PASS  FOR  STUPIDITY. 

•  Bring  the  best  minds  to  bear  on  our  problems. 

-  World-class  Program  Managers  (PMs). 

•  IARPA  will  not  start  a  program 
without  a  good  idea  and  an 
exceptional  person  to  lead  its 
execution. 

-  Full  and  open  competition  to  the  greatest 
possible  extent. 

•  Cross-community  focus. 

-  Address  cross-community  challenges  & 
leverage  community  expertise 

-  Work  transition  strategies  and  plans 


http://www.iarpa.gov
IARPA Offices and Areas of Emphasis

[Figure labels: safe and secure operations; incisive analysis; smart collection; virtual; neuroscience, cognition, computer; small-group human-machine operations]
The “Heilmeier Questions”


1. What are you trying to do?

2.  How  does  this  get  done  at  present?  Who  does  it?  What  are  the  limitations  of 
the  present  approaches? 

-  Are  you  aware  of  the  state-of-the-art  and  have  you  thoroughly  thought  through  all 
the  options? 

3.  What  is  new  about  your  approach?  Why  do  you  think  you  can  be  successful  at 
this  time? 

-  Given  that  you’ve  provided  clear  answers  to  1  &  2,  have  you  created  a  compelling 
option? 

-  What  does  first-order  analysis  of  your  approach  reveal? 

4.  If  you  succeed,  what  difference  will  it  make? 

-  Why  should  we  care? 

5.  How  long  will  it  take?  How  much  will  it  cost?  What  are  your  mid-term  and 
final  exams? 

-  What  is  your  program  plan?  How  will  you  measure  progress?  What  are  your 
milestones/metrics?  What  is  your  transition  strategy? 
The “P” in IARPA is very important


• Technical and programmatic excellence are required

• Each Program will have a clearly defined and measurable end-goal, typically 3-5 years out.

-  Intermediate  milestones  to  measure  progress  are  also  required 

-  Every  Program  has  a  beginning  and  an  end 

-  A  new  program  may  be  started  that  builds  upon  what  has  been 
accomplished  in  a  previous  program,  but  that  new  program  must  compete 
against  all  other  new  programs 

•  This  approach,  coupled  with  rotational  PM  positions,  ensures  that... 

-  IARPA  does  not  “institutionalize”  programs 

-  Fresh  ideas  and  perspectives  are  always  coming  in 

-  Status  quo  is  always  questioned 

-  Only  the  best  ideas  are  pursued,  and  only  the  best  performers  are  funded. 
Office of Incisive Analysis


“maximizing insight from the information we collect, in a timely fashion”


• Large Data Volumes and Varieties: Providing powerful new sources of information from massive, noisy data that currently overwhelm analysts.

• Social-Cultural and Linguistic Factors: Analyzing language and speech to produce insights into groups and organizations.

• Improving Analytic Processes: Dramatic enhancements to the analytic process at the individual and group level.
Office of Smart Collection


“dramatically improve the value of collected data”


• Novel Sources of Information: Create innovative technologies and tools for reaching hard targets in denied areas

• Identity Intelligence: Detect the trustworthiness of others; advance biometrics in real-world conditions

• Tracking and Locating: Accurately locate HF emitters and low-power, moving emitters with a factor of ten improvement in geolocation accuracy
Office of Safe and Secure Operations


“counter emerging adversary potential to deny our ability to operate effectively
in a globally-interdependent and networked environment”


• Computational Power: Revolutionary advances in science and engineering to solve problems intractable with today’s computers

• Trustworthy Components: Getting the benefits of leading-edge hardware and software without compromising security

• Safe and Secure Systems: Safeguarding mission integrity in a hostile world




